Reconstructing historical populations from genealogical data: an overview of methods used for aggregating data from GEDCOM files

Author

  • Corry Gellatly
Abstract

The GEDCOM file format is by far the most widely used means of exchanging genealogical data and extensive collections of these files are available online. There is a huge potential benefit for historians and other academics who are able to make use of the data contained in available GEDCOM files, as these effectively represent hundreds of thousands of hours of crowdsourced work and a considerable source of knowledge about individual families. This paper details a number of methods that are being used to clean and aggregate such genealogical data; this includes a series of steps for screening out substantially flawed files, as well as for cleaning date and place information. A group-linking method is described for identifying duplicates / linkages within a genealogical database based on comparison of family structures. This is tested alongside conventional methods (i.e. comparison of name and birth date) and an estimation of the power of the differing methods is provided. It is proposed that use of the group-linking method provides advantages over conventional methods, because this provides a way of increasing the size and timespan of datasets that may be extracted from a genealogical database with confidence that they do not contain duplicates. The method will be further improved by incorporating probabilistic record linkage techniques, which take into account the frequencies of values in the linkage arrays.
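The contrast drawn above between conventional matching (name and birth date) and group-linking (comparison of family structures) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the record layout, the function names, the use of birth year rather than a full date, and the `min_shared` threshold are all assumptions made for illustration only.

```python
from itertools import combinations


def person_key(name, birth_year):
    """Normalise a (name, birth year) pair: lower-case, collapse whitespace."""
    return (" ".join(name.lower().split()), birth_year)


def family_keys(record):
    """Keys for the individual plus their listed relatives (hypothetical layout)."""
    members = [(record["name"], record["birth_year"])] + record["relatives"]
    return {person_key(n, y) for n, y in members}


def conventional_match(a, b):
    """Conventional comparison: same normalised name and birth year."""
    return person_key(a["name"], a["birth_year"]) == person_key(b["name"], b["birth_year"])


def group_match(a, b, min_shared=3):
    """Illustrative group comparison: families share at least `min_shared` members.

    The threshold is an assumption, not a value taken from the paper.
    """
    return len(family_keys(a) & family_keys(b)) >= min_shared


def candidate_pairs(records, match):
    """Return id pairs of records that the given match function links."""
    return [(a["id"], b["id"]) for a, b in combinations(records, 2) if match(a, b)]


# Invented example records, as if drawn from three different GEDCOM files.
records = [
    {"id": "f1:I1", "name": "John Smith", "birth_year": 1820,
     "relatives": [("Mary Jones", 1822), ("Thomas Smith", 1845)]},
    {"id": "f2:I7", "name": "john  smith", "birth_year": 1820,
     "relatives": [("Mary Jones", 1822), ("Thomas Smith", 1845), ("Ann Smith", 1848)]},
    {"id": "f3:I2", "name": "John Smith", "birth_year": 1820,
     "relatives": [("Eliza Brown", 1825)]},
]

print(candidate_pairs(records, conventional_match))
print(candidate_pairs(records, group_match))
```

In this invented data, name and birth year alone link all three records, while the family comparison links only the two records whose relatives overlap. That is the kind of discrimination the abstract attributes to group-linking, shown here under the stated assumptions.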

Similar resources

Data and Methods for the Production of National Population Estimates: An Overview and Analysis of Available Metadata

Thomas Spoorenberg. Translated by: Elham Fathi, Statistical Center of Iran. Abstract: Official population estimates can be produced using a variety of data sources and methods. These range from the direct extraction of information from continuously updated population registers to procedures for updating the status of a population enumerated previously in a periodic census. Additional sources and ...

An Overview of the Pathology of Historical Context in Soynas Village in Mahabad

In recent decades, Iranian villages have undergone growing cultural and environmental transformation as a result of social change. In the pre-modern period (about half a century ago), villages were known as sources of production and held important social and economic standing as the foundation of cities. However, a number of changes occurred in the rural “lifestyle” resulting from development of...

Know Your Ancestors Better: Demographic Visualization for Large Genealogical Data Sets

Most genealogy software is designed to make it easy to view and edit data about individuals and their relationships to others. While this is very useful, sometimes it is desirable to view information and discover trends about hundreds or thousands of individuals all at once. We present a graphical visualization system for formulating queries and viewing their results for large populations rathe...

Genealogical method of urban typo-morphology with the aim of deriving pattern for providing form-based codes

Introduction: The emergence of form-based codes (FBCs), along with the familiar and near-universal rejection of conventional zoning, is a complex story, and more interesting than might first be supposed. The Codes Study generally does not track developer-driven form-based codes. The socio-economic context of form-based codes has shown positive FBC impacts on physical and environmental well-bein...

F-STONE: A Fast Real-Time DDOS Attack Detection Method Using an Improved Historical Memory Management

Distributed Denial of Service (DDoS) attacks have become common in recent years and can deplete the bandwidth of victim nodes by flooding them with packets. Based on the type and quantity of traffic used for the attack and the exploited vulnerability of the target, DDoS attacks are grouped into three categories: volumetric attacks, protocol attacks, and application attacks. The volumetric attack, which the pro...

Publication date: 2014